Accuracy of non-physician health workers in respiratory rate measurement to identify paediatric pneumonia in low- and middle-income countries: A systematic review and meta-analysis

Background Non-physician health workers play an important role in identifying and treating pneumonia in children in low- and middle-income countries (LMICs). In this systematic review, we summarized the evidence on whether health workers can accurately measure respiratory rate (RR) and identify fast breathing to diagnose pneumonia in children under five years of age. Methods We searched MEDLINE, EMBASE, Web of Science, and Scopus from January 1990 to August 2020 without any language restrictions. Reference lists of included studies were also screened for additional records. Studies evaluating the performance of health workers in measuring RR and/or identifying fast breathing compared to a reference standard were included. The methodological quality of the included studies was assessed using the QUADAS-2 tool. A meta-analysis was conducted to report pooled estimates of sensitivity and specificity. Hierarchical summary receiver operating characteristic curve (HSROC) models were fitted, and subgroup and sensitivity analyses were performed to examine the effects of study variables. Results We included 16 studies, eight of which reported the agreement in RR count between health workers and a reference standard. The median agreements were 39%, 47%, and 67% within ±2, ±3, and ±5 breaths per minute, respectively. Among the 16 included studies, we identified 15 studies that reported the accuracy of a health worker classifying breathing into either fast or normal categories compared to a reference standard. The median sensitivity, specificity, accuracy, and kappa value were 77%, 86%, 81%, and 0.75, respectively. Seven studies reporting the accuracy of identifying fast breathing were included in the meta-analysis. The pooled estimates of sensitivity and specificity were 78% (95% CI = 72-82) and 86% (95% CI = 78-91), respectively. 
Conclusions Despite the problematic nature of reference standards and their variability across studies, our review suggests that the health worker performance in accurately counting RR is relatively poor. However, their performance shows reasonable specificity and moderate sensitivity in identifying fast breathing. Improving the detection of fast breathing in children with suspected pneumonia among health workers is an important child health programme objective and should be given appropriate priority.

burden on health services and is a major cause of hospital admissions in children [4]. In LMICs, the recognition of pneumonia and care-seeking behaviour is generally poor [5]. An important factor limiting the effective diagnosis and treatment of pneumonia in LMICs is a lower doctor-to-population ratio [6]. Moreover, doctors and hospitals are usually more difficult to access [7,8], and the cost of treatment is often prohibitive for caregivers [9]. Therefore, a significant proportion of pneumonia is diagnosed and treated outside hospitals by non-physician health workers [10]. During household visits or community health centre patient encounters, these health workers apply pragmatic case management algorithms to make decisions on diagnosis, treatment, and referral of children suspected to have pneumonia [11,12]. Community-based management of pneumonia by health workers has had a substantial effect on reducing child mortality [13].
According to the World Health Organization (WHO) guidelines, pneumonia diagnosis in children is primarily based on increased respiratory rate (RR). The number of breaths is manually counted for 60 seconds using an acute respiratory illness (ARI) timer or a watch and is then classified as fast or normal breathing according to the child's age group [14,15]. The measurement of RR is challenging, however, and breaths are frequently miscounted, often because of movement of the child or shallow, irregular breathing. Health workers often do not count RR routinely because it is difficult, time-consuming, and depends on the availability of timers. Moreover, a clear definition of a breath is not available within WHO guidelines [16]. This has implications for the quality of clinical practice, as it can lead to under-diagnosis, misdiagnosis, and insufficient or inappropriate treatment [17-19].
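The classification step described above can be sketched in a few lines. This is only an illustration: the age-specific cut-offs used here (60 breaths per minute or more under 2 months, 50 or more at 2-11 months, 40 or more at 12-59 months) are the commonly cited WHO thresholds and are an assumption, since this review does not list them, and the function names are hypothetical.

```python
def fast_breathing_threshold(age_months: float) -> int:
    """Return the RR cut-off (breaths per minute) at or above which breathing counts as fast.

    The cut-offs are assumed from commonly cited WHO guidance, not taken from this review.
    """
    if not 0 <= age_months < 60:
        raise ValueError("age must be between 0 and 59 months")
    if age_months < 2:
        return 60
    if age_months < 12:
        return 50
    return 40


def classify_breathing(rr_bpm: int, age_months: float) -> str:
    """Classify a 60-second breath count as 'fast' or 'normal' for the child's age group."""
    if rr_bpm < 0:
        raise ValueError("respiratory rate cannot be negative")
    return "fast" if rr_bpm >= fast_breathing_threshold(age_months) else "normal"


if __name__ == "__main__":
    # Hypothetical example: a 6-month-old child counted at 52 breaths per minute.
    print(classify_breathing(52, 6))
```

Because the result depends only on which side of the cut-off a count falls, two observers whose counts differ by a few breaths can still reach the same classification unless the true rate lies close to the threshold.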
The diagnosis of pneumonia in LMICs largely depends on health workers' ability to count RR and classify fast and normal breathing accurately. Despite existing literature evaluating the ability of health workers to count and classify fast breathing pneumonia, to our knowledge the evidence has not yet been systematically collated. As the existing literature involves studies with small numbers, a systematic review would allow more robust evidence to inform clinical practice and policy implementation. In this review, we summarized the evidence on whether health workers can accurately measure RR and identify fast breathing in children under five years of age.

METHODS
We conducted this systematic review following the methodology described in the Handbook for Diagnostic Test Accuracy (DTA) Reviews of Cochrane [20]. We used the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2020 [21] and the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) [22] in reporting our findings. The review protocol was registered with the PROSPERO database (registration number CRD42020211127).

Population, index test, reference standard, and target condition
The target participants were children under five years of age who had their RR assessed in the community or when attending a health facility. The index test was counting RR and/or assessing fast breathing manually by non-physician health workers. RR counting and/or fast breathing identification by a human expert or an automated device were considered reference standards. The experts were experienced paediatricians, clinicians or other persons who were trained in clinical algorithms of pneumonia in children.

Search strategy
We developed a search strategy using a combination of medical subject headings (MeSH) and keywords related to the topic. The key concepts were "pneumonia" AND "respiratory rate" AND "accuracy" AND "children under five years of age". We comprehensively searched MEDLINE (via Ovid), EMBASE (via Ovid), Web of Science, and Scopus databases. The detailed search strategy used for each database is reported in Table S1 in the Online Supplementary Document. Included studies were published between January 1st, 1990, and August 9th, 2020. We sought to identify other potentially relevant studies by subjecting all included studies to a forward citation search and examining their reference lists. There were no restrictions on language in the searches. An expert librarian verified the search strategy.

Study eligibility
Studies were included if they met the following criteria: 1. Measurement of RR and/or identification of fast breathing were done manually by non-physician health workers.

Study selection and data extraction
We downloaded the literature search results from different databases into the EndNote X9 reference management software. After excluding duplicates, two review authors (AMK and AOD) independently examined the titles and/or abstracts of the identified studies and excluded irrelevant studies. They then independently analysed the full texts of potentially relevant articles according to the pre-specified eligibility criteria. Disagreements were resolved through discussion between the two reviewers.
The review authors extracted data from studies using a structured checklist (Table S3 in the Online Supplementary Document) and entered them into a Microsoft Excel spreadsheet. Any disagreements were resolved through discussion.

Quality assessment
Both reviewers used the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [24] to assess the quality of the included studies. Four domains (ie, patient selection, index test, reference standard, and flow and timing of the participants) were assessed for risk of bias. Each domain has a set of core signalling questions. The answer to each signalling question was "yes", "no" or "unclear", and the risk of bias was considered as "low", "high" or "unclear". The "unclear" category was used only when insufficient data were reported. An individual domain was considered "low risk" if the answers to all signalling questions were "yes"; "high risk" if at least one answer was "no"; and "unclear" if at least one answer was "unclear", the others were "yes", and no answer was "no". Both review authors checked the risk of bias independently and any disagreement was settled through discussion. We entered these data into Review Manager (version 5.3) to create the figure used in this paper.
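The domain-level decision rule described above can be written as a small function. This is a minimal sketch of the rule as stated in the text, not part of the QUADAS-2 tool itself; the function name `domain_risk` is hypothetical.

```python
def domain_risk(answers: list[str]) -> str:
    """Map the signalling-question answers for one domain to a risk-of-bias rating.

    Rule as described in the text: any "no" gives "high"; otherwise any
    "unclear" gives "unclear"; all "yes" gives "low".
    """
    normalised = [a.strip().lower() for a in answers]
    if not normalised:
        raise ValueError("at least one signalling question is required")
    invalid = set(normalised) - {"yes", "no", "unclear"}
    if invalid:
        raise ValueError(f"unexpected answers: {sorted(invalid)}")
    if "no" in normalised:
        return "high"
    if "unclear" in normalised:
        return "unclear"
    return "low"


if __name__ == "__main__":
    # Hypothetical answers for one domain of one study.
    print(domain_risk(["yes", "unclear", "yes"]))
```

Checking for "no" first makes the precedence explicit: a single "no" outweighs any number of "unclear" answers.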

Data synthesis and analysis
For the studies reporting agreement of RR counts between health workers and the reference standard, we presented the percent agreement and calculated the median agreement with the range of values. For the studies reporting accuracy of classifying fast and normal breathing compared to a reference standard, we presented the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and kappa value of each individual study where data were available, and we calculated median values with ranges.
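To make these measures concrete, the sketch below computes percent agreement within a tolerance and the standard accuracy measures from a 2×2 table. It uses the textbook definitions (including Cohen's kappa); the function names and the example numbers are hypothetical and are not data from any included study.

```python
def percent_agreement(hw_counts, ref_counts, tolerance_bpm):
    """Percentage of paired RR counts that differ by at most tolerance_bpm."""
    if len(hw_counts) != len(ref_counts) or not hw_counts:
        raise ValueError("need two non-empty sequences of equal length")
    within = sum(abs(h - r) <= tolerance_bpm for h, r in zip(hw_counts, ref_counts))
    return 100.0 * within / len(hw_counts)


def two_by_two_metrics(tp, fp, fn, tn):
    """Accuracy measures for fast-breathing classification against a reference standard.

    Assumes every row and column total of the 2x2 table is non-zero.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # overall accuracy (observed agreement)
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Hypothetical paired counts (breaths per minute) and a hypothetical 2x2 table.
    print(percent_agreement([40, 45, 52, 60], [42, 50, 52, 50], tolerance_bpm=2))
    print(two_by_two_metrics(tp=40, fp=10, fn=10, tn=40))
```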
We performed a meta-analysis with those studies reporting classification of fast and normal breathing where true positive (TP), false positive (FP), false negative (FN), and true negative (TN) data could be retrieved. We estimated sensitivity and specificity with 95% confidence intervals (CI) for each study and presented those in paired forest plots to inspect the between-study variance. We fitted hierarchical summary receiver operating characteristic (HSROC) models [25] using user-written modules (metandi, midas) [26,27] in the Stata statistical software (version 16.0) to assess the accuracy of fast breathing identification. Heterogeneity among studies was evaluated visually, from the coupled forest plots, and statistically, using the I² statistic [28]. We used univariate meta-regression to perform subgroup analyses. The parameters for subgroup analysis were as follows: child age, study setting, fast breathing prevalence in the sample, diagnosing health worker, and timing of RR measurement by index test and reference standard.
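As an illustration of the per-study estimates, the sketch below computes a proportion with a 95% Wilson score interval from 2×2 counts. The Wilson method is an assumption chosen for illustration; the text does not state which interval method was used in Stata, and this does not reproduce the pooled HSROC estimates. The function name and counts are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959963984540054):
    """Wilson score confidence interval for a proportion (default z gives 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical study: TP = 80, FN = 20, so sensitivity = 80 / (80 + 20).
    tp, fn = 80, 20
    lower, upper = wilson_interval(tp, tp + fn)
    print(f"sensitivity = {tp / (tp + fn):.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

The same function applies to specificity by passing TN as `successes` and TN + FP as `total`.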
We performed a sensitivity analysis restricting the analysis to studies where fast breathing was defined using WHO RR thresholds. We did not conduct tests for reporting bias due to ambiguity of the factors of publication bias for diagnostic accuracy studies and the inadequacy of tests for identifying asymmetry of funnel plot [29].

Methodological quality of included studies
The assessment of methodological quality is presented in Figure 2. In general, the risk of bias was low or unclear. For patient selection, we evaluated four studies as having a high risk of bias because of non-consecutive or non-random sample selection [17,19,36,40], and six studies as having an unclear risk of bias because of a poorly described sampling method [18,37-39,43] or exclusion criteria [42]. For the index test, we evaluated all studies as having a low risk of bias because the health workers in all studies were blinded to the result of the reference standard, and a pre-specified threshold was used to classify fast breathing. For the reference standard, we evaluated four studies as having a high risk of bias: two studies did not use the WHO RR threshold to classify fast breathing [33,34], and in two studies the reference standard was unblinded [40,41]. We evaluated seven studies as having an unclear risk of bias because of poor reporting on blinding [18,19,31,41,42] and qualification of the experts [17,34-36,40]. For patient flow and timing assessment, we deemed three studies to have a high risk of bias. Among these, a long delay between index test and reference standard was present in two studies [31,40], and one study excluded a certain number of patients from the analysis without proper reporting [36]. Most of the studies had low concerns regarding applicability for all domains. The main concerns were related to inclusion criteria for patient selection in one study [40] and inappropriate classification of fast breathing for the reference standard in two studies [33,34]. Overall, concerns regarding the applicability of the results were low.
Table 2 presents the summary findings for the eight studies reporting the agreement in RR count between health workers and reference standards. Definitions of agreement in RR count varied across studies.
Table 3 shows that the overall median agreements of the health workers were 39%, 47%, and 67% within ±2 bpm, ±3 bpm, and ±5 bpm of reference standards, respectively. The agreements of RR in terms of age groups, settings, types of health workers, and types of reference standards are also presented.

Agreement in respiratory rate count between health workers and reference standard
The agreement of RR counts between health workers and a reference standard was presented using Bland-Altman plots in two studies. Baker et al. [30] reported a wide variation in readings, especially in younger children. The mean difference was -0.6 bpm with limits of agreement (LOAs) from -25.4 to 23.9 bpm [30]. Sinyangwe et al. [43] reported a mean difference of -0.74 bpm with LOAs from -18.8 to 17.3 bpm. In general, health workers over-counted RR relative to the reference standard but under-counted in children with higher RR.
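The mean difference and limits of agreement reported above follow the usual Bland-Altman calculation, sketched below. The sign convention (health worker count minus reference count) is an assumption, as the text does not state it; the function name and the paired counts are hypothetical.

```python
import statistics


def bland_altman(hw_counts, ref_counts):
    """Return (mean difference, lower LOA, upper LOA) for paired RR counts.

    Differences are taken as health worker minus reference (assumed convention).
    Limits of agreement are mean difference +/- 1.96 * SD of the differences.
    """
    if len(hw_counts) != len(ref_counts) or len(hw_counts) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [h - r for h, r in zip(hw_counts, ref_counts)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


if __name__ == "__main__":
    # Hypothetical paired counts in breaths per minute.
    health_worker = [40, 50, 60, 45]
    reference = [42, 48, 61, 47]
    mean_diff, lower, upper = bland_altman(health_worker, reference)
    print(f"mean difference = {mean_diff:.2f} bpm, LOA = {lower:.2f} to {upper:.2f} bpm")
```

A mean difference close to zero with wide limits of agreement, as in the two studies above, indicates little systematic bias but large disagreement for individual children.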

Accuracy in fast breathing identification by health workers compared to reference standard
The summary results of the 15 included studies reporting accuracy of classification of fast and normal breathing compared to a reference standard are presented in Table 4. The accuracy of fast breathing identification differed in different age groups. The agreement was comparatively lower in children aged 0-2 months compared to older children. The accuracy of fast breathing identification was lower in children with uncomplicated illness, in comparison to children with severe illness.
The median sensitivity, specificity, PPV, NPV, accuracy, and kappa value are presented in Table 5. The overall median sensitivity, specificity, and accuracy of classification of fast breathing were 77%, 86%, and 81%, respectively. The median sensitivity was marginally higher in children aged 0-2 months, and median specificity was slightly higher in children aged 2-59 months. The median sensitivity was higher in studies conducted in community settings, whereas the median specificity was higher in studies conducted in health facilities. Although sensitivities were similar, the specificity was higher in facility-based health workers compared to community-based health workers. The median sensitivity was slightly higher when RR was measured simultaneously by the health worker and the reference standard.
[Table fragment; column headers not recovered]
[17]; Children with fast breathing and children with normal breathing; 0-59; 409/576 = 71%
Kalyango, 2012 [37]; Children with any acute illness presented at the health facility; 4-59; 39%; 49%
Langston, 2019 [39]; Children with acute respiratory problem presented at hospital; 2-59; 54% & 49%
Miller, 2014 [19]; Children with acute illness at rural health posts or household; 2-59; 91/130 = 70%
Mukanga, 2011 [18]; Children

Results of meta-analysis
Individual and summary estimates of sensitivity and specificity with 95% CI for all the studies included in the meta-analysis are presented in Figure 3. The pooled sensitivity was 78% (95% CI = 72-82), the pooled specificity was 86% (95% CI = 78-91), and there was considerable heterogeneity (I² = 72%). Figure 4 depicts the hierarchical summary receiver operating characteristic (HSROC) plot of sensitivity and specificity with summary point, summary estimates, 95% confidence region and 95% prediction region for all studies included in the meta-analysis. Table 6 presents subgroup analysis according to child age, study settings, types of health workers, timing of assessment, and prevalence of fast breathing using univariate meta-regression.
We conducted a sensitivity analysis excluding the study where the WHO RR threshold was not used to classify fast breathing to explore whether this could affect overall results (Figure S1 in the Online Supplementary Document). Based on the studies included in the sensitivity analysis, the pooled sensitivity of fast breathing identification by health workers was 78% (95% CI = 72-83), which was almost identical to the result of the primary meta-analysis (where all studies were included); however, the pooled specificity slightly increased to 87% (95% CI = 81-92).
Table 6. Subgroup analysis of sensitivity and specificity of health worker classification of fast and normal breathing compared to a reference standard

DISCUSSION
This systematic review demonstrated that the performance of health workers in the measurement of RR and identification of fast breathing varied across the studies. Overall performance in classifying fast and normal breathing was moderate, with sensitivity ranging from 61% to 88% and a pooled estimate of 78% from the meta-analysis. As the sensitivity is moderate, a significant number of children may have a missed diagnosis of fast breathing, potentially leading to poor outcomes [44]. Some of these children may also have had other clinical signs of respiratory distress like lower chest wall indrawing that could have been identified, resulting in a true pneumonia case detection rate higher than these estimates. Further research is needed to investigate possible causes behind the inconsistency in diagnoses between health workers and reference standards and to elicit the difficulties encountered by the health workers, thus improving sensitivity.
The specificity of the studies ranged from 69% to 91%, with a meta-estimate of 86%, demonstrating consistency in exclusion of a diagnosis of fast breathing pneumonia when the disease is not present. This is potentially encouraging, as it may imply that if these guidelines are followed and RR counting is consistently applied during patient care, few children would receive antibiotics unnecessarily, which could mitigate inappropriate use of antibiotics [44]. It also means there is minimal unwarranted distress and economic cost for caregivers who would otherwise wrongly believe their child has pneumonia [8].
Although there was a moderate agreement in identifying fast breathing, the agreement in RR count between health workers and reference standards was relatively poor. The level of agreement was inconsistent across the studies. The median agreements were 39%, 47%, and 67% within ±2 bpm, ±3 bpm, and ±5 bpm, respectively. It is worth mentioning that obtaining good agreement on RR counts is challenging, even between experts [45]. The difference in RR counts between two observers often does not change the diagnosis. Therefore, classification of RR into fast and normal breathing would be better than the continuous RR count agreements to evaluate the performance of health workers considering its clinical relevance.
The review found that the agreement in RR count was poorer in children aged 0-2 months than in older children. The health workers may find it easier to count RR when it is slower in older children compared to when it is fast in younger children [46]. Interestingly, the review found that, although the specificity of fast breathing identification was higher in children aged 2-59 months, the sensitivity was higher in children aged 0-2 months. However, this finding for identifying fast breathing in newborns was based on two studies only. Sensitivity was also found to be slightly higher in infants compared to older children. More studies evaluating the accuracy of RR measurement and fast breathing identification in newborns and infants would be required to confirm this.
Community-based health workers performed better at counting RR and identifying fast breathing compared to facility-based workers. This might be because community-based workers are usually recruited and trained for a specific program. They usually assess similar signs and symptoms repeatedly, take more time over an assessment, develop better skills in assessing those specific signs and symptoms, and so become more experienced despite being lower cadres [47]. On the other hand, facility-based workers must deal with different types of patients with a wide range of signs and symptoms. The sensitivity was higher in the studies conducted in community settings compared to facility settings. The crowded and busy environment of health facilities in LMICs might influence the performance of the health workers [48].
The interval between health worker assessment and reference standard assessment is also important in evaluating the performance of health workers. The review demonstrates marginally higher sensitivity when both assessments were done simultaneously compared to a short or long delay. The RR can change over a period of time and this variability may affect sensitivity and specificity in identifying fast breathing [45]. Therefore, simultaneous measurement of RR by a health worker and a reference standard would be ideal. A reference measurement taken after a short delay is not valid for comparing RR counts but may be acceptable for comparing a binary pneumonia diagnosis. A prolonged period between the two measurements should be avoided.
The absence of an appropriate reference standard to evaluate the performance of health workers is a challenge. Most of the included studies used manual RR count by an expert as the reference standard. An expert is assumed to be more correct. However, the expert can over-count or under-count breaths. Therefore, using expert counting as a reference standard itself poses challenges due to uncertain accuracy. The possible biases of using a human expert count as the reference standard include the difficulty in measuring the RR over the same period and inconsistencies in human expert RR counting. One study used a capnography reference, an automated method using carbon dioxide (CO2) in exhaled air to extract RR [49]. However, the validity of using capnography to measure RR in field settings is yet to be established. Videography of the child assessment and interpretation of the videos by an expert panel could be recommended as a reference standard for future studies [50,51].
There were several limitations to this review. First, most of the studies included in this review were conducted in Africa, while only two were conducted in Asia, and one in Oceania. Therefore, the review findings might not be generalizable across LMICs. Second, RR was often measured by health workers as a part of a larger study, which may not have provided sufficient information about the methods of measurement or comprehensive results. Third, in most studies, a varying level of training was provided to the health workers before their assessment. This could impact the results of this review [52]. This also raises the question of whether the results of these studies reflect health workers' performance in their day-to-day environments or rather their competency after training. Performance of health workers during the study might not accurately reflect their day-to-day performance; it may also decay over time after training. Fourth, most of the studies used an expert as the reference standard who observed the assessment performed by the health workers. The performance of health workers might improve under observation compared to when conducting their usual day-to-day activities. This means that the findings would reflect a best-case scenario of accuracy, and in a real-world context accuracy might be expected to be even worse [53]. Fifth, different studies used different definitions of RR agreement, ranging from two to five bpm between health workers and the reference standards. Therefore, it was not possible to combine the findings of all studies that reported agreement of RR measurement. Sixth, we have discussed some factors responsible for the variability of performance of health workers across the studies, but there could be other contributing factors.
Finally, we could not include some studies in this review that assessed health workers' performance, including diagnosis and management of pneumonia, which would involve measuring RR and classifying fast breathing. It was unclear whether these outcomes were not measured, or were measured but not reported. Moreover, we could not include some studies in the meta-analysis because TP, FP, FN, and TN data were missing from the reports or could not be retrieved.
Despite these limitations, this review provides evidence on the need to strengthen the performance of health workers in measuring RR and identifying fast breathing pneumonia. Counting RR is the cornerstone of the diagnosis of pneumonia in children, but it is rarely practised in the field during real-world care [54]. The performance of health workers could be enhanced by improved training, supportive supervision, ongoing performance monitoring and feedback [55]. Counting RR manually is challenging, often resulting in inaccurate diagnosis. Therefore, development of improved pneumonia diagnostic aids, such as validated automated RR counters appropriate for use by health workers, might improve the diagnosis of pneumonia in LMICs [56]. Appropriate methods, including a non-biased reference standard, should be used to evaluate the accuracy of health workers' RR counts. Further implementation research could help define the best approach for improving their performance.

CONCLUSIONS
This review showed that the accuracy of RR measurement by non-physician health workers varied across the studies. While they could measure RR and identify fast breathing pneumonia with moderate sensitivity and reasonable specificity, there is still a need to improve RR measurement and identification of fast-breathing pneumonia by these health workers. This could be done through improved training, ongoing supervision, audit of performance, and improved diagnostic aids to measure RR and classify fast breathing accurately. The contribution of well-trained and well-equipped health workers is valuable in LMICs, where it is not always feasible for a child to see a doctor. This should decrease the burden on scarce doctors and health centres in LMICs and may help reduce morbidity and mortality associated with pneumonia.